[Previous] [Next]

Aggregate Data Types

The native data types we have examined so far have been simple. While useful in their own right, they can also serve as building blocks to form aggregate data types. In this section, we examine this concept more closely.

User-Defined Types

A user-defined type (UDT) is a compound data structure that holds several variables of simpler data types. Before you can use a UDT variable, you must first define its structure, using a Type directive in the declaration section of a module:

Private Type EmployeeUDT
    Name As String
    DepartmentID As Long
    Salary As Currency
End Type

UDTs can be declared as Private or Public. Under Visual Basic 5 or previous versions, only UDTs declared in BAS modules can be Public. In Visual Basic 6, all modules except forms can include Public UDT definitions, provided that the project type isn't Standard EXE and that the class isn't Private. For more information, see Chapter 16.

Once you have defined a Type structure, you can create variables of that type as you would do with any Visual Basic native type. You can then access its individual items using the dot syntax:

Dim Emp As EmployeeUDT
Emp.Name = "Roscoe Powell"
Emp.DepartmentID = 123

UDTs can contain both conventional and fixed-length strings. In the former case, the structure in memory holds just a pointer to the actual data, whereas in the latter case the strings' characters are stored in the same block as the other items of the UDT structure. This is reflected by the LenB function, which you can use on any UDT variable to learn the number of actual bytes used:

Print LenB(Emp)     ' Prints 16: 4 for Name, regardless of its length +
                    ' 4 for DepartmentID (Long) + 8 for Salary (Currency)

Type structures can also contain substructures, for example:

Private Type LocationUDT
    Address As String
    City As String
    Zip As String
    State As String * 2
End Type
Private Type EmployeeUDT
    Name As String
    DepartmentID As Long
    Salary As Currency
    Location As LocationUDT
End Type

When you access such nested structures, you can resort to the With…End With clause to produce more readable code:

With Emp
    Print .Name
    Print .Salary 
    With .Location
        Print .Address
        Print .City & "  " & .Zip & "  " & .State
    End With
End Type

When you're working with a complex UDT, assigning a value to all its individual components is often a nuisance. Fortunately, since VBA supports functions that return UDTs, you can write support procedures that considerably simplify the job:

Emp = InitEmployee("Roscoe Powell", 123, 80000)
...
Function InitEmployee(Name As String, DepartmentID As Long, _
    Salary As Currency) As EmployeeUDT
    InitEmployee.Name = Name 
    InitEmployee.DepartmentID = DepartmentID
    InitEmployee.Salary = Salary
End Function

Visual Basic lets you copy one UDT to another UDT with the same structure using a regular assignment, as in the following code:

Dim emp1 As EmployeeUDT, emp2 As EmployeeUDT
...
emp2 = emp1

Arrays

Arrays are ordered sets of homogeneous items. Visual Basic supports arrays made up of elementary data types. You can build one-dimensional arrays, two-dimensional arrays, and so on, up to 60 dimensions. (I never met a programmer who bumped into this limit in a real application, though.)

Static and dynamic arrays

Basically, you can create either static or dynamic arrays. Static arrays must include a fixed number of items, and this number must be known at compile time so that the compiler can set aside the necessary amount of memory. You create a static array using a Dim statement with a constant argument:

' This is a static array. 
Dim Names(100) As String

Visual Basic starts indexing the array with 0. Therefore, the preceding array actually holds 101 items.

Most programs don't use static arrays because programmers rarely know at compile time how many items you need and also because static arrays can't be resized during execution. Both these issues are solved by dynamic arrays. You declare and create dynamic arrays in two distinct steps. In general, you declare the array to account for its visibility (for example, at the beginning of a module if you want to make it visible by all the procedures of the module) using a Dim command with an empty pair of brackets. Then you create the array when you actually need it, using a ReDim statement:

' An array defined in a BAS module (with Private scope)
Dim Customers() As String
...
Sub Main()
    ' Here you create the array.
    ReDim Customer(1000) As String
End Sub

If you're creating an array that's local to a procedure, you can do everything with a single ReDim statement:

Sub PrintReport()
    ' This array is visible only to the procedure.
    ReDim Customers(1000) As String
    ' ...
End Sub

If you don't specify the lower index of an array, Visual Basic assumes it to be 0, unless an Option Base 1 statement is placed at the beginning of the module. My suggestion is this: Never use an Option Base statement because it makes code reuse more difficult. (You can't cut and paste routines without worrying about the current Option Base.) If you want to explicitly use a lower index different from 0, use this syntax instead:

ReDim Customers(1 To 1000) As String

Dynamic arrays can be re-created at will, each time with a different number of items. When you re-create a dynamic array, its contents are reset to 0 (or to an empty string) and you lose the data it contains. If you want to resize an array without losing its contents, use the ReDim Preserve command:

ReDim Preserve Customers(2000) As String

When you're resizing an array, you can't change the number of its dimensions nor the type of the values it contains. Moreover, when you're using ReDim Preserve on a multidimensional array, you can resize only its last dimension:

ReDim Cells(1 To 100, 10) As Integer
...
ReDim Preserve Cells(1 To 100, 20) As Integer    ' This works.
ReDim Preserve Cells(1 To 200, 20) As Integer    ' This doesn't.

Finally, you can destroy an array using the Erase statement. If the array is dynamic, Visual Basic releases the memory allocated for its elements (and you can't read or write them any longer); if the array is static, its elements are set to 0 or to empty strings.

You can use the LBound and UBound functions to retrieve the lower and upper indices. If the array has two or more dimensions, you need to pass a second argument to these functions to specify the dimension you need:

Print LBound(Cells, 1)   ' Displays 1, lower index of 1st dimension
Print LBound(Cells)      ' Same as above
Print UBound(Cells, 2)   ' Displays 20, upper index of 2nd dimension
' Evaluate total number of elements.
NumEls = (UBound(Cells) _ LBound(Cells) + 1) * _
    (UBound(Cells, 2) _ LBound(Cells, 2) + 1)

Arrays within UDTs

UDT structures can include both static and dynamic arrays. Here's a sample structure that contains both types:

Type MyUDT
    StaticArr(100) As Long
    DynamicArr() As Long
End Type
...
Dim udt As MyUDT
' You must DIMension the dynamic array before using it.
ReDim udt.DynamicArr(100) As Long
' You don't have to do that with static arrays.
udt.StaticArr(1) = 1234

The memory needed by a static array is allocated within the UDT structure; for example, the StaticArr array in the preceding code snippet takes exactly 400 bytes. Conversely, a dynamic array in a UDT takes only 4 bytes, which form a pointer to the memory area where the actual data is stored. Dynamic arrays are advantageous when each individual UDT variable might host a different number of array items. As with all dynamic arrays, if you don't dimension a dynamic array within a UDT before accessing its items, you get an error 9—"Subscript out of range."

Arrays and variants

Visual Basic lets you store arrays in Variant variables and then access the array items using the Variant variable as if it were an array:

ReDim Names(100) As String, var As Variant
' Initialize the Names array (omitted).
var = Names()        ' Copy the array into the Variant.
Print var(1)         ' Access array items through the Variant.

You can even create an array of Variant elements on the fly using the Array function and store it in a Variant variable:

' Arrays returned by the Array() function are zero-based.
Factorials = Array(1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800)

Likewise, you can pass an array to a procedure that expects a Variant parameter and then access the elements of the array through that parameter:

' A polymorphic function that sums the values in any array
Function ArraySum(arr As Variant) As Variant
    Dim i As Long, result As Variant
    For i = LBound(arr) To UBound(arr)
        result = result + arr(i)
    Next
    ArraySum = result
End Function

The most interesting feature of the preceding routine is that it works correctly with any type of numeric one-dimensional array. It even works with String arrays, but in that case you get the concatenation of all items, not their sum. This procedure is extremely powerful and reduces the amount of code you have to write to deal with different kinds of arrays. But you should be aware that accessing array items through a Variant parameter noticeably slows down the execution. If you need the best performance, write specific routines that process specific types of arrays.

You can also pass a multidimensional array to a routine that expects a Variant parameter. In this case, you can still access the array elements through the Variants, but if you don't know at compile time how many dimensions the array has, your routine has to determine that number before proceeding. You can get this value using a trial-and-error approach:

' This routine returns the number of dimensions of the array
' passed as an argument, or 0 if it isn't an array.
Function NumberOfDims(arr As Variant) As Integer
    Dim dummy as Long
    On Error Resume Next
    Do
        dummy = UBound(arr, NumberOfDims + 1)
        If Err Then Exit Do
        NumberOfDims = NumberOfDims + 1
    Loop
End Function

TIP
It's perfectly legal to use the function name inside a function's code as if it were a local variable, as the previous code snippet does. Often this technique lets you save a local variable and a final assignment before exiting the routine, which indirectly makes your code run slightly faster.

Here's a modified ArraySum routine that uses NumberOfDims and works with both one- and two-dimensional arrays:

Function ArraySum2(arr As Variant) As Variant
    Dim i As Long, j As Long, result As Variant
    ' First check whether we can really work with this array.
    Select Case NumberOfDims(arr)
        Case 1       ' One-dimensional array
            For i = LBound(arr) To UBound(arr)
                result = result + arr(i)
            Next
        Case 2       ' Two-dimensional array
            For i = LBound(arr) To UBound(arr)
                For j = LBound(arr, 2) To UBound(arr, 2)
                    result = result + arr(i, j)
                Next
            Next
        Case Else   ' Not an array, or too many dimensions
            Err.Raise 1001, , "Not an array or more than two dimensions"
    End Select
    ArraySum2 = result
End Function

Often, if a Variant contains an array, you don't know the basic type of that array in advance. The VarType function returns the sum of the vbArray constant (decimal 8192), plus the VarType of the data included in the array. This lets you test that the array passed to a routine is of a given type:

If VarType(arr) = (vbArray + vbInteger) Then 
    ' Array of integers
ElseIf VarType(arr) = (vbArray + vbLong) Then
    ' Array of Longs
ElseIf VarType(arr) And vbArray Then
    ' An array of another type (just tests a bit)
End If

You can also test whether a Variant holds an array using the IsArray function. When a Variant variable holds an array, the TypeName function appends a pair of empty parentheses to its result:

Print TypeName(arr)      ' Displays "Integer()"

As I've explained, you can either assign an array to a Variant variable or you can pass an array as a Variant parameter of a procedure. While the two operations look very similar, they're substantially different. To execute an assignment, Visual Basic makes a physical copy of the array. As a result, the Variant variable doesn't point to the original data but to the copy; from this point on, all the manipulations you do through the Variant variable don't affect the original array. Conversely, if you call a procedure and pass an array as a Variant parameter, no data is physically copied and the Variant simply works as an alias of the array. You can reorder array items or modify their values, and your changes are immediately reflected in the original array.

Assigning and returning arrays

Visual Basic 6 adds two important features to arrays. First, you can perform assignments between arrays. Second, you can write procedures that return arrays. You can assign arrays only of the same type and only if the target is a dynamic array. (The latter condition is necessary because Visual Basic might need to resize the target array.)

ReDim a(10, 10) As Integer
Dim b() As Integer
' Fill the a array with data  (omitted).
b() = a()        ' This works!

It's no surprise that native assignment commands are always faster than the corresponding For…Next loops that copy one item at a time. The actual increment in speed heavily depends on the data type of the arrays and can vary from 20 percent to 10 times faster. A native assignment between arrays also works if the source array is held in a Variant. Under Visual Basic 4 and 5, you could store an array in a Variant, but you couldn't do the opposite—that is, retrieve an array stored in a Variant variable and store it back in an array of a specific type. This flaw has been fixed in Visual Basic 6:

Dim v As Variant, s(100) As String, t() As String
' Fill the s() array   (omitted).
v = s()        ' Assign to a Variant.
t() = v        ' Assign from a Variant to a dynamic string array.

You often use the capacity to assign arrays to build functions that return arrays. Notice that pair of brackets at the end of the first line in the following procedure:

Function InitArray(first As Long, Last As Long) As Long()
    ReDim result(first To Last) As Long
    Dim i As Long
    For i = first To Last
        result(i) = i
    Next
    InitArray = result
End Function

The new capability of returning arrays lets you write highly versatile array routines. Visual Basic 6 itself includes a few new string functions—namely Join, Split, and Filter—that rely on it. (You'll find more about these new string functions in Chapter 5). Here are two examples of what you can do with this intriguing feature:

' Returns a portion of a Long array
' Note: fails if FIRST or LAST are not valid
Function SubArray(arr() As Long, first As Long, last As Long, _
    newFirstIndex As Long) As Long()
    Dim i As Long
    ReDim result(newFirstIndex To last _ first + newFirstIndex) As Long
    For i = first To last 
        result(newFirstIndex + i - first) = arr(i)
    Next
    SubArray = result
End Function

' Returns an array with all the selected items in a ListBox
Function SelectedListItems(lst As ListBox) As String()
    Dim i As Long, j As Long
    ReDim result(0 To lst.SelCount) As String
    For i = 0 To lst.ListCount - 1
        If lst.Selected(i) Then
            j = j + 1
            result(j) = lst.List(i)
        End If
    Next
    SelectedListItems = result
End Function

Byte arrays

Byte arrays are somewhat special because Visual Basic lets you directly assign strings to them. In this case, Visual Basic performs a direct memory copy of the contents of the string. Because all Visual Basic 5 and 6 strings are Unicode strings (two bytes per character), the target array is redimensioned to account for the actual string length in bytes (which you can determine using the LenB function). If the string contains only characters whose code is in the range 0 through 255 (the case if you work with Latin alphabets), every other byte in the array will be 0:

Dim b() As Byte, Text As String
Text = "123"
b() = Text      ' Now b() contains six items: 49 0 50 0 51 0

It's also possible to perform the opposite operation:

Text = b()

This special treatment reserved for Byte arrays is meant to ease the conversion from old Visual Basic 3 applications that use strings to hold binary data, as I explained in "The Byte Data Type" section, earlier in this chapter. You can exploit this feature to create blindingly fast string routines when you have to process each individual character in a string. For example, see how quickly you can count all the spaces in a string:

' NOTE: this function might not work with non-Latin alphabets.
Function CountSpaces(Text As String) As Long
    Dim b() As Byte, i As Long
    b() = Text
    For i = 0 To UBound(b) Step 2
        ' Consider only even-numbered items.
        ' Save time and code using the function name as a local variable.
        If b(i) = 32 Then CountSpaces = CountSpaces + 1
    Next
End Function

The preceding routine is about three times faster than a regular routine, which uses Asc and Mid$ functions to process all the characters in the argument, and even faster if you turn on the Remove Array Bounds Check compiler optimization. The only drawback of this technique is that it isn't Unicode-friendly because it considers only the least significant byte in each 2-byte character. If you plan to convert your application to some language that relies on Unicode—Japanese, for example—you should stay clear of this optimization technique.

Inserting and deleting items

Some of the most common operations you perform on arrays are inserting and deleting items, shifting all the remaining elements toward higher indices to make room or toward lower indices to fill the "hole" a deletion has left. You usually do this with a For…Next loop, and you can even write generic array procedures that work with any type of array (with the usual restrictions about arrays of UDTs and fixed-length strings that can't be passed to a Variant parameter):

Sub InsertArrayItem(arr As Variant, index As Long, newValue As Variant)
    Dim i As Long
    For i = UBound(arr) - 1 To index Step -1
        arr(i + 1) = arr(i)
    Next
    arr(index) = newValue
End Sub

Sub DeleteArrayItem(arr As Variant, index As Long)
    Dim i As Long
    For i = index To UBound(arr) - 1
        arr(i) = arr(i + 1)
    Next
    ' VB will convert this to 0 or to an empty string.
    arr(UBound(arr)) = Empty
End Sub

If your application works intensively with arrays, you might find that an approach based on For…Next loops is too slow. In some cases, you can considerably speed up these operations by using the RtlMoveMemory API function, which many Visual Basic programmers know under its popular alias name, CopyMemory.1 This function lets you move a block of bytes from one memory address to another memory address and works correctly even if the two areas partially overlap. Here's the code that inserts a new item in an array of Longs:

Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
    (dest As Any, source As Any, ByVal numBytes As Long)

Sub InsertArrayItemLong(arr() As Long, index As Long, newValue As Long)
    ' We let VB evaluate the size of each item using LenB().
    If index < UBound(arr) Then
        CopyMemory arr(index + 1), arr(index), _
            (UBound(arr) _ index) * LenB(arr(index))
    End If
    arr(index) = newValue
End Sub

Sub DeleteArrayItemLong(arr() As Long, index As Long)
    If index < UBound(arr) Then
 CopyMemory arr(index), arr(index + 1), _
            (UBound(arr) _ index) * LenB(arr(index))
    End If
    arr(index) = Empty
End Sub

CAUTION
The prerequisite for using the CopyMemory API function is that data must be stored in contiguous memory locations, so you absolutely can't use it to insert or remove elements in String and Object arrays, nor in arrays of UDTs that contain conventional strings, object references, or dynamic arrays. (Fixed-length strings and static arrays in UDTs are OK, though.)

Note that while you can't use the preceding routines for arrays other than Long arrays, the statements in the procedure body can be recycled for another data type without any change, thanks to the use of the LenB function. Therefore, you can derive new array functions that work for other data types by simply modifying the procedure's name and its parameter list. For example, you can create a new function that deletes an item in a Double array by editing just the first line of code (shown in boldface):

Sub DeleteArrayItemDouble(arr() As Double, index As Long)
    ' All the other statements here are the same as in DeleteArrayItemLong
    ' ...
End Sub

Sorting

Sorting is an operation that you often perform on arrays. As you probably know, there are dozens of different sort algorithms, each one with its strengths and weaknesses. I found that the Shell Sort algorithm works well in most cases, and I've prepared a generic routine that sorts any one-dimensional array of a data type compatible with the Variant type, either in ascending or descending order:

Sub ShellSortAny(arr As Variant, numEls As Long, descending As Boolean)
    Dim index As Long, index2 As Long, firstItem As Long
    Dim distance As Long, value As Variant
    ' Exit if it is not an array.
    If VarType(arr) < vbArray Then Exit Sub
    firstItem = LBound(arr)
    ' Find the best value for distance.
    Do
        distance = distance * 3 + 1
    Loop Until distance > numEls
    ' Sort the array.    
    Do
        distance = distance \ 3
        For index = distance + firstItem To numEls + firstItem - 1
            value = arr(index)
            index2 = index
            Do While (arr(index2 - distance) > value) Xor descending
                arr(index2) = arr(index2 - distance)
                index2 = index2 - distance
                If index2 - distance < firstItem Then Exit Do
            Loop
            arr(index2) = value
        Next
    Loop Until distance = 1
End Sub

Arrays of arrays

While you can create two-dimensional arrays in Visual Basic, their structure isn't really flexible for at least two reasons: All rows in the array must have the same number of elements, and you can use ReDim Preserve to change the number of columns but you can't add new rows. The first point is especially important because it often leads you to declare an array that's far too large for your needs, thus allocating a lot of memory that in most cases remains largely unused. You can solve both problems using a structure known as an array of arrays.

The technique is conceptually simple: Since you can store an array in a Variant variable, you can build an array of Variants, where each item holds an array. Each subarray—a row of this pseudo-array—can hold a different number of elements, and you don't need to use more memory than is strictly necessary.

Click to view at full size.

Here's an example, based on an imaginary PIM (Personal Information Manager) program. In this program, you need to keep track of a list of appointments for each day of the year. The simplest solution would be to use an array in which each row corresponds to a day in the year and each column to a possible appointment. (For the sake of simplicity, let's assume that each appointment's data can be held in a string.)

ReDim apps(1 To 366, 1 To MAX_APPOINTMENTS) As String

Of course, you now have the problem of setting a reasonable value for the MAX_APPOINTMENTS symbolic constant. It should be high enough to account for all possible appointments in a day but not too high because you might be wasting a lot of memory without any real reason. Let's see how the array of arrays technique can help us save memory without posing any artificial limit to your application:

' A module-level variable
Dim apps(1 To 366) As Variant

' Add an appointment for a given day.
Sub AddNewAppointment(day As Integer, description As String)
    Dim arr As Variant
    If IsEmpty(apps(day)) Then
        ' This is the first appointment for this day.
        apps(day) = Array(description)
    Else
        ' Add the appointment to those already scheduled.
        arr = apps(day)
        ReDim Preserve arr(0 To UBound(arr) + 1) As Variant
        arr(UBound(arr)) = description
        apps(day) = arr
    End If
End Sub

' Extract all the appointments for a given day.
Sub ListAppointments(day As Integer, lst As ListBox)
    Dim i As Long
    For i = 0 To UBound(apps(1))
        lst.AddItem apps(1)(i)
    Next
End Sub

In this example, I kept the code as simple as possible and used an array of Variant arrays. You could save even more memory if each row of this array were built using an array of a more specific data type (String, in this case). Note the special syntax used to address an item in an array of arrays:

' Change the description for the Nth appointment.
apps(day)(n) = newDescription

Nothing keeps you from extending this concept further, introducing an array of arrays of arrays, and so on. If you're dealing with arrays in which each row can vary considerably in length, this approach is going to save you a lot of memory and, in most cases, improve your overall performance too. A key feature of an array of arrays is that you can process entire rows of your pseudo-array as if they were single entities. For example, you can swap them, replace them, add and delete them, and so on.

' Move the January 1st appointments to January 2nd. 
apps(2) = apps(1)
apps(1) = Empty

Finally, an important advantage of this technique is that you can add new rows without losing the current contents of the array. (Remember that you can use ReDim Preserve on regular arrays only to modify the number of columns, not the number of rows.)

' Extend the appointment book for another nonleap year.
ReDim Preserve apps(1 to UBound(apps) + 365) As Variant

Collections

Collections are objects exposed by the VBA library. They can be used in Visual Basic applications to store groups of related data. In this sense, Collections are similar to arrays, but the similarities stop here because of these substantial differences:

With all these advantages, you might wonder why collections haven't supplanted arrays in the hearts of Visual Basic developers. The main reason is that Collections are slow, or at least they're noticeably slower than arrays. To give you an idea, filling an array of 10,000 Long elements is about 100 times faster than filling a Collection of the same size. Take this into account when you're deciding which data structure best solves your problem.

The first thing you must do before using a Collection is create it. Like all objects, a Collection should be declared and then created, as in the following code:

Dim EmployeeNames As Collection
Set EmployeeNames = New Collection

Or you can declare an auto-instancing collection with one single line of code:

Dim EmployeeNames As New Collection

You can add items to a Collection object by using its Add method; this method expects the value you're adding and a string key that will be associated with that value:

EmployeeNames.Add "John Smith", "Marketing"

where value can be virtually anything that can be stored in a Variant. The Add method usually appends the new value to the collection, but you can decide where exactly you want to store it using either the before argument or the after argument:

' Insert this value before the first item in the collection.
EmployeeNames.Add "Anne Lipton", "Sales"
' Insert this new value after the element added previously.
EmployeeNames.Add value2, "Robert Douglas", ,"Sales"

Unless you have a good reason to store the new value somewhere other than at the end of the Collection, I suggest that you not use the before or after arguments because they slow down the Add method. The string key is optional. If you specify it and there's another item with the same key, the Add method will raise an error 457—"This key is already associated with an element of this collection." (Keys are compared in a case-insensitive way.)

Once you have added one or more values, you can retrieve them using the Item method; this method is the default member of the Collection class, so you can omit it if you want. Items can be read using their numeric indices (as you do with arrays) or their string keys:

' All the following statements print "Anne Lipton".
Print EmployeeNames.Item("Sales")     
Print EmployeeNames.Item(1)
Print EmployeeNames("Sales")
Print EmployeeNames(1)

TIP
If you want to write faster programs, always access a Collection's items using their string keys rather than their numeric indices. As counterintuitive as it may appear, using string keys is almost always faster than using numeric indices, especially if the Collection has thousands of elements and the one you're interested in isn't near the beginning of it.

If you pass a numeric index that's either negative or greater than the number of items currently in the collection, you get an error code 9—"Subscript out of range" (exactly as if you were acting on a standard array); if you pass a nonexistent string key, you get error code 5—"Invalid procedure call or argument." Curiously, the Collection object doesn't offer a native method to test whether an item actually exists. The only way to learn whether an element is already in a Collection is by setting up an error handler and testing for the existence of that element. Here's a function that does the trick, and you can reuse it with any Collection:

Function ItemExists(col As Collection, Key As String) As Boolean
    Dim dummy As Variant
    On Error Resume Next
    dummy = col.Item(Key)
    ItemExists = (Err <> 5)
End Function

The Count method returns the number of items in the collection:

' Retrieve the last item in the EmployeeNames collection.
' Note that collections are one-based.
Print EmployeeNames.Item(EmployeeNames.Count)

You can remove items from a Collection object using the Remove method; this method accepts either a numeric index or a string key:

' Remove the Marketing Boss.
EmployeeNames.Remove "Marketing"

If the key doesn't exist, the Collection object raises an error 5—"Invalid procedure call or argument." Collections don't offer a native way to remove all the items in a single operation, so you're forced to write a loop. Here's a general function that does it for you:

Sub RemoveAllItems(col As Collection)
    Do While col.Count
        col.Remove 1
    Loop
End Sub

TIP
A faster way to remove all the items in a Collection is to destroy the Collection object itself by setting it to Nothing or to another fresh, new instance:

' Both these lines destroy the current contents
' of the Collection.
Set EmployeeNames = Nothing
Set EmployeeNames = New Collection

This approach works only if there isn't any other object variable pointing to the Collection object, however. If you aren't sure of this, the only safe way to remove all items is the loop I showed you previously.

Finally, as I mentioned before, Collections don't allow you to modify the value of an item. If you want to change the value of an item, you must first delete it and then add a new item. Here's generic routine that uses this technique:

' INDEX can be either a numeric or a string value.
Sub ReplaceItem(col As Collection, index As Variant, newValue As Variant)
    ' First remove that item (exits with error if it doesn't exist).
    col.Remove index
    ' Then add it again.
    If VarType(index) = vbString Then
        ' Add a new item with the same string key.
        col.Add newValue, index
    Else
        ' Add a new item in the same position (without any key).
        col.Add newValue, , index
    End If
End Sub

Iterating on Collection objects

Since you can address items using their numeric indices, you can loop on all the elements of a Collection object using a regular For…Next loop:

' Load the contents of a Collection into a ListBox control.
Dim i As Long
For i = 1 To EmployeeNames.Count
    List1.AddItem EmployeeNames(i)
Next

While this code works, Collection objects offer another, better way to perform the same task, based on the For Each…Next loop:

Dim var As Variant
For Each var in EmployeeNames
    List1.AddItem var
Next

Notice that the loop's controlling variable (var, in this example) must be of type Variant so that it can host any value that had been added to the Collection. The only exception to this rule is when you're sure that the Collection contains only a given class of objects (forms, controls, or user-defined objects), in which case you can use a controlling variable of that specific type:

' If the Customers collection includes only references
' to individual Customer objects
Dim cust As Customer
For Each cust In Customers
    List1.AddItem cust.Name
Next

Using a controlling variable of a specific object type usually offers better performance than a generic Variant or Object variable. Iterating on the elements of a collection using a For Each…Next loop is generally faster than a regular For…Next loop because the latter forces you to refer to individual elements using their numeric indices, which is a relatively slow operation.

Working with Collection objects

Collections are very flexible structures and are useful in many cases for solving simple but recurring programming jobs. The very nature of Collection objects suggests that you use them whenever you need to associate a key with a value for a faster retrieval. The following routine builds on the fact that Collections accept only unique keys to filter out all duplicated entries in an array of any Variant-compatible type:

' Filter out all duplicate entries in any Variant-compatible array.
' On entry, NUMELS should be set to the number of items to be examined.
' On exit, NUMELS holds the number of nonduplicate items.
Sub FilterDuplicates(arr As Variant, numEls As Long)
    Dim col As New Collection, i As Long, j As Long
    On Error Resume Next
    j = LBound(arr) - 1
    For i = LBound(arr) To numEls
        ' Add a dummy zero value, but use the array's value as the key.
        col.Add 0, CStr(arr(i))
        If Err = 0 Then
            j = j + 1
            If i <> j Then arr(j) = arr(i)
        Else
            Err.Clear
        End If
    Next
    ' Clear all remaining items.
    For i = j + 1 To numEls: arr(i) = Empty: Next
    numEls = j
End Sub

In some cases, you might feel limited by the fact that Collection objects can't hold UDT values, so you don't know what to do when you need to store multiple values associated with the same key. One solution is to use objects instead of UDTs, but employing this technique is often overkill because you rarely want to add a class module to your project just to store multiple values in a Collection. A much better solution is to build arrays on the fly and store them as items in the Collection. A practical example is shown below.

' Store Employees data in a Collection.
Dim Employees As New Collection
' Each item is made up of (Name, Dept, Salary).
Employees.Add Array("John", "Marketing", 80000), "John"
Employees.Add Array("Anne", "Sales", 75000), "Anne"
Employees.Add Array("Robert", "Administration", 70000), "Robert"
...

' List all employees' names.
Dim var As Variant
For Each var in Employees
    Print var(0)       ' Item 0 is the employee's name.
Next
' Where does Anne work?
Print Employees("Anne")(1)
' How much does Robert earn?
Print Employees("Robert")(2)

Of course, you can make these compound structures as complex as you need to. For example, each Employees element might hold a Collection of other pieces of information, such as how many hours each employee has worked for a given customer:

Dim Employees As New Collection, Customers As Collection
' Each item is made up of (Name, Dept, Salary, Customers).
Set Customers = New Collection
Customers.Add 10, "Tech Eight, Inc"
Customers.Add 22, "HT Computers"
Employees.Add Array("John", "Marketing", 80000, Customers), "John"
' Start with a fresh collection each time.
Set Customers = New Collection
Customers.Add 9, "Tech Eight, Inc"
Customers.Add 44, "Motors Unlimited"
Employees.Add Array("Anne", "Sales", 75000, Customers), "Anne"
' etc.  ....

This complex structure lets you quickly and elegantly solve a number of problems and answer a few interesting questions:

' Is John working with customer "HT Computers"?
Dim hours As Long, var As Variant
On Error Resume Next
hours = Employees("John")(3)("HT Computers")
' HOURS holds zero if the above statement failed.

' How many hours has Anne worked for external customers?
hours = 0 
For Each var In Employees("Anne")(3)
    hours = hours + var
Next

' How many hours have been devoted to customer "Tech Eight, Inc"?
On Error Resume Next
hours = 0
For Each var In Employees
    hours = hours + var(3)("Tech Eight, Inc")
Next

As you can see, collections are highly flexible data structures. I suggest that you explore their capabilities in depth, and I'd bet that you'll find yourselves using them more often than you anticipated.

Dictionary Objects

Dictionary objects are new to the Visual Basic language. Technically speaking, however, they don't belong to Visual Basic as Collections do, nor do they belong to the VBA language. Rather, they're exposed by an external library, the Microsoft Scripting Library. In fact, to use these objects you must add a reference to the SCRRUN.DLL library (which appears under the name Microsoft Scripting Runtime, as you can see in Figure 4-1). Once you do that, you can press F2 to invoke the Object Browser and explore Dictionary's methods and properties.

Dictionary objects are very similar to Collection objects. Actually, they were originally created to provide VBScript programmers with a Collection-like object. The Dictionary object isn't exclusive to Visual Basic 6; the Scripting Library can be freely downloaded from http://msdn.microsoft.com/scripting/ and used with any Automation-compliant programming language, including Visual Basic 5. Visual Basic 6 installs this library as part of its setup procedure, so you don't have to download and register it separately.

Click to view at full size.

Figure 4-1. Add a reference to the Microsoft Scripting Runtime library in the References dialog box in order to use the Dictionary object.

You'll see in a moment how closely akin Dictionary objects are to Collection objects, so it's easy to illustrate their features by comparing these two. You create a Dictionary object as you do any object, for example using an auto-instancing variable:

Dim dict As New Scripting.Dictionary

In all cases, note that the Scripting prefix is optional, but I suggest you use it in case your References dialog box includes other external libraries that happen to expose a Dictionary object. Using the complete libraryname.classname syntax when declaring object variables is an intelligent way to avoid bugs in the future.

CAUTION
The VBScript-oriented nature of Dictionary objects has somewhat resisted the migration to Visual Basic. All the examples in the Visual Basic 6 manuals are taken as is from VBScript documentation and therefore use a CreateObject function to create the Dictionary. (VBScript doesn't support the New operator.) Besides, all examples store references to Dictionary objects in Variant variables. (VBScript doesn't support specific object variables.)

' This is what VB6 docs report.
Dim dict         ' Variant is VB's default data type.
Set dict = CreateObject("Scripting.Library")

While this code works, you should absolutely avoid it for two reasons: CreateObject is about twice as slow as New, and above all, using a generic Variant variable instead of a more specific variable of type Dictionary adds overhead any time you access the object's properties and methods because you are actually doing late binding instead of early binding. My informal benchmarks show that a specific variable speeds up the code to 30 times faster, and it also delivers more robust applications because all syntactical errors are trapped by the compiler.

You add an item to a Dictionary object using its Add method, as you do with Collection objects. But the order of the two arguments is reversed (first the key and then the item's value), and you can't omit the key or specify the before or after arguments:

dict.Add "key", value

If the Dictionary object contains a value associated with the same string key, an error 457 is raised (the same as that raised by Collection objects). Dictionary objects support the Item member, but there are important differences from Collection objects in the way the Item member is implemented. For Dictionary objects, Item is a read-write property, not a method, and you can reference an element only by using a key (which can be a string or a number), but not through its numeric index in the Dictionary. In other words, you can reference an item only by its key, not by its position:

Print dict("key")             ' Print the current value,
dict("key") = newValue        ' and then modify it.
Print dict(1)                 ' Displays an empty string because
                              ' there's no item with this key.

There's also a third important difference: If the key isn't found in the Dictionary, no error is raised. If your code was trying to read that item, the Dictionary returns an Empty value; if it was assigning a value, another item is added to the Dictionary. In other words, you can add new items without using the Add method:

Print dict("key2")            ' Returns Empty.
dict(key2) = "new value"      ' Adds a new element, and
                              ' no error is raised.

In this respect, Dictionary objects are more akin to PERL associative arrays than to Visual Basic's Collection objects. Like Collections, Dictionaries support the Count property, but you can't use it to set up For…Next loops.

You can remove Dictionary items using the Remove method:

dict.Remove "key"           ' Numeric indices aren't supported.

If the key isn't found in the Dictionary, an error 32811 is raised. (The corresponding message isn't very helpful about the real cause: "Method 'Remove' of object 'IDictionary' failed"). As is not the case for Collections, you can remove all items in a Dictionary object in one shot by using the RemoveAll method:

dict.RemoveAll          ' No need for a loop.

Dictionary objects are also more flexible than Collection objects in that you can modify the key associated with an element by using the Key property:

dict.Key("key") = "new key"

The Key property is write-only, but this isn't a real limitation: It wouldn't make any sense to read the value of a key, since you can reference it only by using the current key value. Dictionary objects expose an Exists method that lets you test whether an element actually exists. You need this method because otherwise you couldn't discern between nonexistent and Empty values:

If dict.Exists("John") Then Print "Item ""John"" exists"

Dictionary objects also expose two methods, Items and Keys, which quickly retrieve all values and keys into an array in one single operation:

Dim itemValues() As Variant, itemKeys() As Variant, i As Long
itemValues = dict.Items    ' Retrieve all values.
itemKeys = dict.Keys       ' Retrieve all keys.
' Put keys and values into a single list.
For i = 0 To UBound(itemValues)
    List1.AddItem itemKeys(i) & " = " & itemValues(i)
Next

The Items and Keys methods are also the only ways to access the elements of a Dictionary object because you can't use either the For…Next loop (because numeric indices are actually interpreted as keys) or the For Each…Next loop. If you don't want to explicitly load items and keys into Variant arrays, however, you can take the following shortcut, based on the fact that Variant arrays do support enumeration through the For Each…Next loop:

Dim key As Variant
For Each key In dict.Keys
    List1.AddItem key & " = " & dict(key)
Next

Interestingly, Keys is also the default method for the Dictionary object, so you can omit it in the preceding code snippet and you end up with a code syntax that makes it appear as if the Dictionary object supported enumeration through the For Each…Next loop:

For Each key In dict
    List1.AddItem key & " = " & dict(key)
Next

The last property of the Dictionary object is the CompareMode property, which states how a Dictionary object compares keys. It can be assigned three values: 0-BinaryCompare (case-sensitive comparisons, the default setting), 1-TextCompare (case-insensitive comparisons), and 2-DatabaseCompare (not supported under Visual Basic). You can assign this property only when the Dictionary object is empty.

Dictionary vs. Collection objects

After this overview, it should be clear that Dictionary objects are more flexible than Collection objects. Their only missing feature is the ability to address items by their numeric indices (which, on the other hand, is one of the slowest operations you can perform on Collection objects). Unless you need this capability, the choice seems clear: Use Dictionary objects whenever you need their flexibility. But remember that you will need to distribute another ancillary file with your application.

Microsoft hasn't revealed how Collection and Dictionary objects are implemented internally, but it seems to me that the Dictionary object is based on a more efficient algorithm than the Collection object. My informal benchmarks show that creating a 10,000-item Dictionary is about 7 times faster than adding the same number of items to an empty Collection object. Reading these items back is about 3 to 4 times faster. This gap decreases when you create larger structures (only 2.5 times faster with 100,000 items), but in general a Dictionary object can be considered faster than a Collection object. The actual speed difference might depend on how keys are distributed, available memory, and other factors. I suggest that you do some benchmarks with your own actual data before opting for one solution or another.